Network Analysis with the Enron Email Corpus

Authors

  • Johanna Hardin
  • Ghassan Sarkis
  • P. C. Urc
Abstract

We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article’s focus on centrality, students can explore the dependence of statistical models on initial assumptions and the interplay between centrality measures and hierarchical ranking, and they can use completed studies as springboards for future research. The Enron corpus also presents opportunities for research into many other areas of analysis, including social networks, clustering, and natural language processing.
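As a rough illustration of what a centrality measure computes on an email network, the sketch below calculates normalized degree centrality in plain Python. The sender/recipient pairs are hypothetical toy data, not drawn from the Enron corpus, and this listing does not name the six measures the paper applies, so degree centrality stands in only as a common example. The choices to ignore message direction and to count repeated messages once are modeling assumptions of the kind the abstract says students can explore.

```python
from collections import defaultdict

# Hypothetical toy data: (sender, recipient) pairs, NOT taken from the Enron corpus.
emails = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "bob"),  # repeated message; collapses to one undirected edge
]


def build_undirected_graph(pairs):
    """Map each person to the set of people they exchanged email with."""
    neighbors = defaultdict(set)
    for sender, recipient in pairs:
        if sender != recipient:  # skip self-addressed messages
            neighbors[sender].add(recipient)
            neighbors[recipient].add(sender)
    return neighbors


def degree_centrality(neighbors):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


graph = build_undirected_graph(emails)
centrality = degree_centrality(graph)

# Print people from most to least central under this measure.
for person, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

Changing an assumption, for example weighting edges by message count or keeping direction, can change the resulting ranking, which is one way to see how much a result depends on the initial modeling choices.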

Similar resources

Quantifying and Comparing Centrality Measures for Network Individuals as Applied to the Enron Corpus

The ever increasing body of social networks creates an opportunity for extensive network analysis and investigations of communications, cliques, and network contributions. In this study, we focus our attention on the Enron email corpus and the corresponding network of employees, attempting to gather information from the email communications. Methods of data reduction on the email corpus were us...

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

In this paper, we present an empirical study of email classification into two main categories “Business” and “Personal”. We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For th...

Social Network Analysis and Organizational Disintegration: The Case of Enron Corporation

Email networks in contemporary organizations are fairly representative of the underlying communications networks. We show that changes in communication networks have implications for studying organization disintegration. In this paper, we analyzed the changing communication network structure at Enron Corporation during the period of its disintegration (2000-2001). Our goal was to understand how...

Enron Emails as Graph Data Corpus for Large-scale Graph Querying Experimentation

In this paper we describe the Enron email corpus in a graph/network data format. Nodes of the graph are emails connected with named entities (NE) extracted from the text, such as people, email addresses, and telephone numbers. Edges are links between NEs representing co-occurrence in the same email part, paragraph, sentence, or composite NE. The Enron Graph corpus contains a few million nodes and it is quite large corpu...

An Email Attachment is Worth a Thousand Words, or Is It?

There is an extensive body of research on Social Network Analysis (SNA) based on the email archive. The network used in the analysis is generally extracted either by capturing the email communication in the From, To, Cc and Bcc email header fields or by the entities contained in the email message. In the latter case, the entities could be, for instance, the bag of words, URLs, names, phones, etc. It ...

Journal title:
  • CoRR

Volume abs/1410.2759

Publication date 2014